2012 US Presidential Election Campaign Contributions – Part 2: by Andrew Lavers

The 2012 presidential election was won by Barack Obama of the Democratic Party, running against Mitt Romney of the Republican Party. The general election followed the earlier primary election phase, which was contested in the Republican Party. No Democrats contested Obama in a primary for the Democratic nomination.

Campaign finance regulations require the recording and publishing of campaign contributions. This report describes an analysis of that data.

This project uses a sample from the very large full countrywide FTP dataset to make analysis manageable. Using sampling rather than a single state data set enables some interesting state comparisons to be made. All data below is reported for the SAMPLE and has NOT been adjusted for the full dataset.

Load the data

IMPORTANT: This is a 5% sample from the complete data set. This should be sufficient to represent the full data set, but totals of contributions computed from the sample will not equal the full-data totals. Any totals reported here have NOT been adjusted for sampling.

The data to be loaded is in file campaign_5.csv. This is a munged data set based on the presidential campaign ALL states data set. The munging is documented in the separate [file:AndrewLaversCampainMunge.html] produced from [file:AndrewLaversCampainMunge.Rmd]

## [1] "Analyzing  259760 rows from /Users/alavers/Documents/Udacity/Data Analysis with R/P3/campaign_5.csv"
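The load step might look roughly like the following base R sketch. The helper name load_campaign and the tiny demonstration file are my own illustrations (not the report's actual code), and the sketch assumes the CSV has a header row.

```r
# Hypothetical helper: read a sampled contributions file and report its size.
load_campaign <- function(path) {
  contributions <- read.csv(path, stringsAsFactors = FALSE)
  message(sprintf("Analyzing %d rows from %s", nrow(contributions), path))
  contributions
}

# Demonstration on a tiny made-up file (not real campaign data).
demo_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(cand_nm = c("Obama, Barack", "Romney, Mitt"),
             contbr_st = c("CA", "TX"),
             contb_receipt_amt = c(50, 250)),
  demo_path, row.names = FALSE
)
demo <- load_campaign(demo_path)
str(demo)
```

For the real analysis the call would simply be load_campaign("campaign_5.csv"), with the path adjusted to wherever the munged sample lives.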

Univariate Plots Section

Contribution Amounts

Investigate contribution amounts and establish a category that will make analysis of contributions more consistent.

At first thought, contribution amount is a continuous variable, but in reality there are very distinct buckets that show in the chart as spikes at 25, 50, 100, 250, 500, 1000, 1500, 2000, and 2500.

Clearly there are many small contributions, which is confirmed by the following basic statistics of the contribution amount.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.01   25.00   50.00  180.00  100.00 2500.00

Based on the above chart and some experimentation, the contribution was encoded in the distinct buckets as a new category variable named contb_receipt_amt_category.

The levels of this new category variable can be clearly seen in the following chart:

After some experimentation the above buckets produced the expected falling counts by increase in contribution amount. As is expected, the number of contributors declines as the contribution amount increases which reflects the overall country wealth demographics.
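The bucketing can be sketched with base R's cut(). The break points below follow the levels listed later in this report ($25, $50, $100, $250, $1000, $2500); the exact cut points and labels used in the original code are assumptions, as is the helper name categorize_amount.

```r
# Upper bounds of each bucket (assumed from the breaks named in the report).
amount_breaks <- c(0, 25, 50, 100, 250, 1000, 2500)

# Hypothetical helper: map dollar amounts to ordered bucket labels.
categorize_amount <- function(amount) {
  # right = TRUE puts an amount equal to a break into the lower bucket,
  # so a $25 contribution lands in (0,25].
  cut(amount, breaks = amount_breaks, right = TRUE, dig.lab = 4)
}

amounts <- c(0.01, 25, 35, 100, 500, 2500)   # made-up example values
contb_receipt_amt_category <- categorize_amount(amounts)
table(contb_receipt_amt_category)
```

Closing each interval on the right keeps the common round amounts (25, 50, 100, ...) at the top of their own bucket rather than at the bottom of the next one.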


Contributions by Candidate

A basic look at the difference by candidate.

In the above chart, the two final presidential candidates for the general election are orders of magnitude greater than the primary candidates, so further analysis of the primary candidates is not likely to be that interesting.


Contributions over time

Every contribution in the data set has the date of the contribution so we can do some analysis of the contributions over time.

Contributions accelerate approaching the early November election date, as can be seen in the above chart. It will be interesting to see the different pace of Republican versus Democratic contributions.


Contributions by State

States are an important factor in the presidential election because of the way the Electoral College of state representatives actually elects the president.

The above counts by state show a few large states as frequent contributors.

Swing states are important because the president is elected through the Electoral College. States that are evenly split between the parties can “swing” the election because they cast all their electoral votes based on the results within the state. See 2012 swing states

While the above chart shows that there are substantial differences in contributions when comparing Swing States to Non-swing states, the overall state populations are most likely masking the effect. Very populous states such as NY, CA and TX are not swing states.


Contributions by Party

A basic investigation of party differences.

The counts by party above are substantially different. We should later investigate the relative size and number of contribution by party.


Univariate Analysis

What is the structure of your dataset?

The data set to analyze has individual contributions to 2012 presidential candidates. Each contribution has:

  • Contributor - Name and address

  • Contributor - Occupation (Not useful because it isn’t normalized; many different equivalent values exist. There may be a few interesting high-frequency common items like LAWYER, PHYSICIAN)

  • Contribution - Date, Year month, amount

  • Committee and Candidate - Committee ID, Candidate ID and Candidate Name

  • Election type - P2012 for primaries and G2012 for general election

  • Form type and transaction id - Not used in this analysis

  • Party Affiliation - Republican or Democratic

What is/are the main feature(s) of interest in your dataset?

  • Individual contribution - Each individual contribution is represented as a row, so row counts give contribution counts

  • Candidate (cand_nm) - Obviously the presidential election is about the person that will be president

  • State (contbr_st) - In presidential elections, states vote with Electoral College ballots, so votes within a state matter. See for example http://en.Wikipedia.org/wiki/Electoral_College_%28United_States%29. The state identifiers in this data set include identifiers for non-voting territorial possessions (e.g., Guam, US Virgin Islands)

  • Contribution Amount (contb_receipt_amt) - The dollar amount of the contribution is the most interesting item to analyze. This part of the data set was limited to contributions up to the 2012 contribution limit of $2500. There are varying reports of whether contributions under $200 must be reported. About half the contributions in this data set are $50 and under. We can analyze the differences and totals of contributions to reach broad conclusions, but these will not represent the full population of contributions.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

  • Date (contb_receipt_dt, contb_receipt_ym) - Looking at contributions over time may prove interesting. The Republicans had a large primary field while the Democrats presented a single candidate, the incumbent president. Time may show us when the primary candidates dropped out.

  • Occupation (contbr_occupation) - While this may be interesting, the values are not normalized and can’t be effectively compared across all occupations. There are a few discrete occupations that may be interesting, such as LAWYER, PHYSICIAN, TEACHER.

  • Employer (contbr_employer) - It would be very interesting to find employers with many contributors. However, this may be better left for another type of analysis; visualization with plots may not effectively tease out isolated hot spots. A few values that appear here, “RETIRED”, “HOMEMAKER”, “UNEMPLOYED”, and “NOT EMPLOYED”, may be useful as broad categories, but there are many “INFORMATION REQUESTED” entries, which indicate missing data.

Did you create any new variables from existing variables in the dataset?

  • Party (party) - Party helps pool the early contributions that went to multiple Republican primary candidates, into two main buckets Republican vs. Democrat. The party value was determined from Wikipedia articles and merged onto the main data set.

  • Swing State (swing_st) - In USA presidential elections, the president is actually elected by an Electoral College of state representatives who cast a preallocated number of electoral votes on behalf of their state. In many states, the full block of electoral votes must go to the winning candidate in that state. If one of these states has almost equal Democratic and Republican support, it can be the state that “swings” the election. In 2012 there were 9 swing states as tracked by the New York Times.

  • Categorized Contribution Amount (contb_receipt_amt_category) - The charts show distinct levels of contribution at these breaks: $25, $50, $100, $250, $1000, and $2500. A categorized variable was added to facilitate analysis.

  • Receipt year month (contb_receipt_ym) - The Year month of the receipt date to simplify trend plotting.
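Two of the derived variables above can be sketched in a few lines of base R. The state list below is an assumption on my part: it is the set of nine states commonly cited as 2012 battlegrounds, and the list actually used in the munging step may differ. The rows are made up for illustration.

```r
# Assumed list of nine 2012 swing states (verify against the source you use).
swing_states_2012 <- c("CO", "FL", "IA", "NC", "NH", "NV", "OH", "VA", "WI")

contributions <- data.frame(   # made-up rows, not real data
  contbr_st = c("OH", "CA", "VA"),
  contb_receipt_dt = as.Date(c("2012-08-14", "2011-12-01", "2012-10-30")),
  stringsAsFactors = FALSE
)

# Flag contributions from swing states.
contributions$swing_st <- contributions$contbr_st %in% swing_states_2012

# Year-month string such as "2012-08" for trend plots.
contributions$contb_receipt_ym <- format(contributions$contb_receipt_dt, "%Y-%m")
contributions
```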

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

For more detail, see the data munging documented file:AndrewLaversCampaignMunge.html

The main operations performed were:

  • Remove negative values that represent returns. Strictly, we should also find the original contribution and remove that as well.
    Finding the original is made more difficult by sampling, and since returns make up only about 3% of the data set, this was left for a future exercise. Negative return values distort the means and medians.

  • Limit to contributions of $2500. This eliminated a few very large party committee transfers, many smaller corporate contributions and leaves a more consistent data set of individual contributions only.

  • Categorized the contribution amount into buckets

  • Eliminate “states” that are territorial possessions etc., that do not form the Electoral College.

  • Limit to dates after 1/1/2011

  • Eliminate the Green Party because there are very few contributions
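The filtering steps listed above could be expressed along these lines. This is only a sketch under assumptions: the real munging lives in the separate munge document, the party labels ("Democratic", "Republican", "Green") are assumed values, and I have read "after 1/1/2011" as including that date. The helper name clean_contributions is hypothetical.

```r
# The 50 states plus DC cast Electoral College votes.
electoral_states <- c(state.abb, "DC")

# Hypothetical helper applying the filters described in the list above.
clean_contributions <- function(df) {
  keep <- df$contb_receipt_amt >= 0 &                 # drop returns (negative amounts)
    df$contb_receipt_amt <= 2500 &                    # keep individual-sized contributions
    df$contbr_st %in% electoral_states &              # drop territories and the like
    df$contb_receipt_dt >= as.Date("2011-01-01") &    # assumed inclusive cutoff
    df$party != "Green"                               # very few contributions
  df[keep, , drop = FALSE]
}

raw <- data.frame(   # made-up rows, one per filter rule plus one keeper
  contb_receipt_amt = c(50, -50, 5000, 100, 25, 75),
  contbr_st = c("CA", "CA", "NY", "GU", "TX", "OH"),
  contb_receipt_dt = as.Date(c("2012-05-01", "2012-06-01", "2012-05-01",
                               "2012-05-01", "2010-03-01", "2012-07-04")),
  party = c("Democratic", "Democratic", "Republican",
            "Republican", "Republican", "Green"),
  stringsAsFactors = FALSE
)
cleaned <- clean_contributions(raw)
cleaned
```

Only the first made-up row survives; each of the others trips exactly one rule, which makes the filters easy to check one at a time.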


Bivariate Plots Section

Parties vs. States

Here we look at some of the relationships between parties and states.

As can be seen in the above chart, there is not much difference in individual contribution amounts in Swing or Non-swing states. The median Democratic Swing State contribution of 45 differs slightly from the 50 we find in Non-swing States. The Republican median of 100 is unchanged.

A SQRT scale is needed in the above chart to show the distribution. Clearly the populous states such as NY, CA, TX, FL, and IL dominate in election contributions.

A cursory review of US State populations suggests that the above chart corresponds approximately to state populations. Virginia, a swing state, may be an exception being 7th in contributions but only 12th in population.

Thus swing states don’t seem to make an obvious difference.


Contributions by Candidate

The general election candidates, Mitt Romney and Barack Obama will obviously raise more because they are in the race to the end. These next charts investigate the differences between candidates, first by contribution size and then by total amount.

The above chart shows significantly higher median contribution amounts for the primary candidates. Rick Perry’s median of 1500 is similar to that of Timothy Pawlenty and much higher than the rest of the field. Pawlenty has far fewer overall contributions so this similarity here may be deceiving. The difference in distribution between Mitt Romney and Barack Obama can clearly be seen.

This result above is interesting to me because I have always wondered how much money was “wasted” during primary elections. In the chart you can see that both Mitt Romney and Barack Obama received orders of magnitude more than the other, primary-only candidates.


Contributions over time

We will now investigate contributions over time and how they relate to other factors.

In the above chart contributions accelerate as the election approaches, and stop immediately after. There are distinctly earlier contributions to the Republican Party between 8/2011 and 4/2012. The Democratic Party ends very strong with noticeably higher contributions.

In the months after the primaries, total contributions are similar until September, when the Democratic Party jumps ahead by $2 million. This lead is then maintained. The Democratic Party convention was held September 3-6, which could be a trigger for this additional contribution. Note that because of sampling at 5%, this gain would be about $40 million in the full data set.
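The scaling from the sample figure to a full-data estimate is just a division by the sampling fraction. A quick sketch, assuming the 5% sample is representative (the variable names are mine):

```r
sample_fraction <- 0.05   # 5% sample, as stated in this report
sample_lead <- 2e6        # lead observed in the sample, in dollars

# Rough full-data estimate: sample total divided by the sampling fraction.
estimated_full_lead <- sample_lead / sample_fraction
format(estimated_full_lead, big.mark = ",", scientific = FALSE)
```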

The Democratic Party leads dramatically in number of contributions after the end of the Republican primaries: more than 100,000 in this 5% sample.

In the above chart the Democratic acceleration is clear with the ratio of cumulative contribution rising rapidly to more than 4:1. This is perhaps one of the clearest conclusions of the analysis.


Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

A clear relationship exists between contribution counts and contribution size, both of which vary by party. The Democratic Party is the clear leader by contribution counts and the Republican Party leads in contribution size. As may be expected, contributions accelerate closer to the election. An interesting note is the so-called “convention bump”. The press and attention around the national convention attract more interest and contributions. This bump in September 2012 can be clearly seen for the Democratic Party.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

There seem to be some similarities between the primary candidates who ultimately dropped out in the primaries. These can be seen in the box plot by candidate above. In particular the contribution size median is much higher suggesting that in order to progress in a primary election, the candidate must be supported by large contributions. In the next section we will explore this further and one of the final plots will show this relationship.

What was the strongest relationship you found?

The strongest relationship is between party and contribution count and size. Both parties raised similar total amounts.
The Democratic Party contribution count was about four times that of the Republican Party.


Multivariate Plots Section

Cumulative Contribution by Party

The next charts focus on cumulative sums to investigate the rate of contributions over time. The slope of a cumulative chart indicates the rate of change.
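As a rough illustration of how such cumulative series can be built, the sketch below uses base R's ave() with cumsum on made-up monthly totals. The column name total and the numbers are assumptions for the example only.

```r
monthly <- data.frame(   # made-up monthly totals, not real data
  party = rep(c("Democratic", "Republican"), each = 3),
  contb_receipt_ym = rep(c("2012-07", "2012-08", "2012-09"), times = 2),
  total = c(100, 150, 400, 200, 150, 150)
)

# Sort by party and month so the running sum accumulates in time order.
monthly <- monthly[order(monthly$party, monthly$contb_receipt_ym), ]

# Running total within each party.
monthly$cumulative <- ave(monthly$total, monthly$party, FUN = cumsum)
monthly
```

The month-to-month increase in the cumulative column is the monthly total itself, which is why a steeper slope on the cumulative chart means a faster rate of contributions.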

Firstly, a chart to investigate which contribution size dominated in each party and how that varied over time

The above area plot reveals significant differences between Republican and Democratic donation patterns over time. The Republican Party reached $10 million in May 2012 with about $6 million coming from the top category. The Democratic Party reached the $10 million mark two months later with almost even contribution totals in the top three categories. The Republican chart is dominated by the top contribution category.

Next, we are interested in the rate of contributions, how fast they grew, when they started and when they ended up. We will omit Mitt Romney and Barack Obama because their totals are much greater than the other candidates.

The faceted plot above is interesting because it shows very different shapes for non-starters and those that remained competitive. We will improve this for the final plots.


Contribution Size by State

Does residing in a swing state influence the contribution size? The next charts explore relationships with the state of residence.

There must be some good information in the state and geography data, but plots like this are not very meaningful, probably because the variation in state population, and hence the number of contributions per state, dominates. Perhaps it would be better to focus on means and medians.

This chart overwhelmingly shows the broad Democratic base across all states, because both mean and median are relatively flat and the mean is very close to the median. By contrast, the Republican contribution size mean is substantially higher than the median, with substantial variation across the states.

While there is substantial variation in mean contribution size by state, the median is the same in about 80% of all states.
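A comparison like this needs the mean and median per party and state. Here is a minimal base R sketch on made-up amounts; the helper group_stat and the derived column names are hypothetical, chosen only for this example.

```r
contributions <- data.frame(   # made-up amounts, not real data
  party = rep(c("Democratic", "Republican"), each = 3, times = 2),
  contbr_st = rep(c("OH", "TX"), each = 6),
  contb_receipt_amt = c(25, 50, 75, 100, 100, 2500,
                        20, 50, 80, 50, 100, 1050)
)

# Hypothetical helper: one summary statistic per party and state.
group_stat <- function(fun) {
  aggregate(contb_receipt_amt ~ party + contbr_st, data = contributions, FUN = fun)
}

by_group <- group_stat(mean)
names(by_group)[3] <- "mean_amt"
by_group$median_amt <- group_stat(median)$contb_receipt_amt

# A ratio well above 1 signals a few large contributions pulling the mean up.
by_group$mean_to_median <- by_group$mean_amt / by_group$median_amt
by_group
```

In the toy data the Democratic groups have a ratio of 1, while a single large Republican contribution per state pushes the mean far above the median, which is the pattern described above.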

Occupations

We will take a look at occupations and see if there is some relationship. There are 31099 different occupations listed in this sample, so let’s look at the top occupations by frequency. First a chart of counts, then a chart of amounts, because we have seen these differences before.

The above chart shows the differences in percentage of contributions by occupation. Positive percentages are Republican; negative percentages are Democratic. The occupations are ordered from Democratic-favoring to Republican-favoring.
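One way to compute such a signed percentage is the Republican share minus the Democratic share of contribution counts within each occupation. That definition is my assumption for this sketch (the chart may have used a different one), and the rows are made up.

```r
contributions <- data.frame(   # made-up rows, not real data
  contbr_occupation = c(rep("PROFESSOR", 4), rep("HOMEMAKER", 4), rep("RETIRED", 2)),
  party = c("Democratic", "Democratic", "Democratic", "Republican",
            "Republican", "Republican", "Republican", "Democratic",
            "Democratic", "Republican")
)

# Counts by occupation and party, then row percentages within each occupation.
counts <- table(contributions$contbr_occupation, contributions$party)
shares <- prop.table(counts, margin = 1) * 100

# Positive = Republican-leaning, negative = Democratic-leaning.
lean <- shares[, "Republican"] - shares[, "Democratic"]
sort(lean)   # Democratic-favoring first, Republican-favoring last
```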

Next, contribution amounts

The above chart shows the differences in $ amounts of contributions by occupation. Positive values are Republican; negative values are Democratic. The occupation order is the same as in the previous chart: Democratic-favoring to Republican-favoring by count.

A few observations:

  • The largest contribution total differences are from Professors, Attorneys, Retired, and Homemakers
  • Fewer occupations dominate the Republican contributions while the Democratic occupations are more varied. 18 of the 30 occupations have more dollars contributed to the Democratic Party.
  • Comparing to the previous chart, the crossover from Democratic to Republican is much higher, which reflects the much larger contribution amount prevalent with the Republican party.

Compare Occupations by State

Do retired people contribute to different parties depending on their state of residence? This is the kind of question we will explore next.

This chart turned out to be not very meaningful. The intent was to chart the change in party leaning by Occupation by State. By ordering the axes this yields a predominantly blue Democratic presence top-left and red Republican presence bottom-right. The state size influences the vertical bands which distorts the information.

A few useful things can be seen. Retired people are heavy contributors and split equally between the parties in all states. Homemakers are clearly more likely to contribute to the Republican party regardless of state. Professors and Educators are overwhelmingly Democratic across all states, although their contributions vary in size.


Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The plots in this section strengthened the idea that the size and number of contributions are very different for the Republican vs. the Democratic party.

The final presidential candidate totals tower over the other candidates. For example, Barack Obama’s total receipts are 5.6 times the total of all Republican candidates excluding Mitt Romney.

Were there any interesting or surprising interactions between features?

Because swing states are pivotal in the election, I expected to find evidence of this. I quickly realized, however, that this can only be analyzed in the context of the state itself. States vary greatly in population and in per capita income. Perhaps these factors could be used in a future study to normalize the comparisons between states.


Final Plots and Summary

Plot ONE - How did fundraising progress for the Primary candidates?

This chart explores the growth – cumulative contributions – by month and year for the Republican participants in the presidential primary and omits the general election contenders, Mitt Romney and Barack Obama. The line width indicates the number (rate) of contributions. The original chart was hard to follow with colors - it took quite some time to figure out how to plot the names close to the lines.

This chart focuses on the Republican primary candidates who were real Contenders, raising $200,000 or more versus the Non-contenders that never made much headway. Although the $200,000 limit is somewhat arbitrary it does separate these groups into distinct shapes.
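The Contender versus Non-contender split can be sketched as a simple threshold on each candidate's total. The candidate names and amounts below are placeholders, not figures from the data set.

```r
primary <- data.frame(   # made-up monthly totals for placeholder candidates
  cand_nm = c("Candidate A", "Candidate A", "Candidate B", "Candidate B"),
  total = c(150000, 100000, 50000, 20000)
)

contender_threshold <- 200000   # the somewhat arbitrary cutoff from the text

# Total raised per candidate, as a named numeric vector.
totals <- sapply(split(primary$total, primary$cand_nm), sum)

status <- ifelse(totals >= contender_threshold, "Contender", "Non-contender")
status
```

Keeping the threshold in a named variable makes it easy to try other cutoffs and see whether the two groups still separate cleanly.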

The very interesting pattern on the left suggests candidates need a bump or surge in contributions to gain traction and stay in the race. Note that Ron Paul, the most successful, is a little different, starting early and having a more sustained climb rather than a sudden burst.

Rick Perry’s rapid rise in contribution amounts from fewer contributors can be seen from the relatively thin line. This suggests he may have been fueled by wealthy contributors but was unable to turn that into a sustainable contribution base of the kind seen with the other long-lived candidates.

Gingrich and Santorum started later but grew steadily, leveling out a little earlier than Ron Paul.

Ron Paul achieved his leadership with far smaller contributions, as can be seen from the width of the line, which indicates a larger number of contributions.

It’s also interesting to note that the slope of the rapid ascent period is very similar for most of the candidates, until the point that they somewhat suddenly flatten.


Plot TWO - Comparison of contribution size

This analysis has shown that there clearly are differences in the size and number of contributions by party even though both parties collected nearly similar total amounts. These charts place this stark difference side-by-side for easy comparison.

From the left side chart above, the difference in contribution size is very clear. The Republican Party received 52 percent of all funds from contributions in the range $1,000 to $2,500. By contrast, the Democratic Party received only 23.3 percent of all funds from that category. In addition, the Democratic Party received more than 50% of all funds from contributions of $250 and under.

From the right side chart, which shows the counts, the broad base of the Democratic Party is clear. About 60% (35% + 25%) of Democratic Contributions are $50 or under.
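Percentages like these come from normalizing within each party. A minimal sketch with xtabs() and prop.table() follows; the bucket labels and amounts are made up for the example, and the labels are my own rather than those in the original charts.

```r
bucket_labels <- c("<=25", "26-50", "51-100", "101-250", "251-1000", "1001-2500")

contributions <- data.frame(   # made-up rows, not real data
  party = c(rep("Democratic", 4), rep("Republican", 3)),
  contb_receipt_amt = c(25, 50, 100, 2000, 50, 1500, 2500)
)
contributions$contb_receipt_amt_category <- cut(
  contributions$contb_receipt_amt,
  breaks = c(0, 25, 50, 100, 250, 1000, 2500),
  labels = bucket_labels
)

# Dollars by category and party, then column percentages (share of party funds).
funds <- xtabs(contb_receipt_amt ~ contb_receipt_amt_category + party,
               data = contributions)
fund_share <- prop.table(funds, margin = 2) * 100

# Counts by category and party, then share of each party's contribution count.
count_share <- prop.table(
  table(contributions$contb_receipt_amt_category, contributions$party),
  margin = 2
) * 100

round(fund_share, 1)
round(count_share, 1)
```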


Plot THREE - Democratic vs. Republican Contribution Pattern

The total contributions for both parties were very similar, with the Republicans raising 100.4% of the Democratic total.

Earlier exploratory charts suggested that the number and size of contributions are quite different between the two parties. The chart below extends this by exploring the timing, size and count of contributions for the two parties. Each contribution is plotted as a point in the traditional colors of the parties: blue for the Democrats and red for the Republicans. Choosing a very low alpha allows the distribution of the contributions to show through. Setting the size to be the contribution amount ensures that the color density overall reflects the relative value of each contribution. The horizontal bands are the size categories of the contribution. Time flows left to right.
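A plot of this kind might be built roughly as follows with ggplot2. This is a sketch under assumptions, not the code behind the chart: the use of vertical jitter to form the bands, the alpha value, and the helper name plot_contribution_pattern are all my own choices.

```r
library(ggplot2)

# Hypothetical helper: one point per contribution, colored by party,
# sized by amount, with a low alpha so dense regions show through.
plot_contribution_pattern <- function(df) {
  ggplot(df, aes(x = contb_receipt_dt,
                 y = contb_receipt_amt_category,
                 colour = party,
                 size = contb_receipt_amt)) +
    geom_jitter(alpha = 0.05, height = 0.4, width = 0) +
    scale_colour_manual(values = c(Democratic = "blue", Republican = "red")) +
    labs(x = "Contribution date", y = "Contribution size category")
}

demo <- data.frame(   # made-up rows, not real data
  contb_receipt_dt = as.Date(c("2012-06-15", "2012-10-15", "2012-10-15")),
  contb_receipt_amt = c(25, 2500, 50),
  contb_receipt_amt_category = factor(c("<=25", "1001-2500", "26-50"),
                                      levels = c("<=25", "26-50", "1001-2500")),
  party = c("Democratic", "Republican", "Democratic")
)
p <- plot_contribution_pattern(demo)
```

With only three demo rows the points are nearly invisible at this alpha; the low value only pays off with hundreds of thousands of overlapping points.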

This chart is quite striking and describes the differences between the Democratic and Republican fund raising. There are a few clear patterns:

  • Republicans make more donations above $250 and clearly lead the $1000 - $2500 category.

  • Democratic contributions start earlier and seem more dominant in the earlier stages except perhaps for December 2011 and January 2012. This earlier start is curious since there was no primary competition for the Democratic nomination.

  • The number of contributions in the few months before the election in November, is dominated by Democrats, as can be seen by the intense blue in May, June and July 2012.

  • In October, one month before the election, Republicans seem to suddenly increase the number and size of their contributions as can be seen from the intense red.


Reflection

This analysis took longer than expected because I felt compelled to look at many angles. In doing so I learned many aspects of R and ggplot that I otherwise wouldn’t have.

Struggles

  • Dataset size - The size of the complete data set, as warned by our instructors, slowed down the analysis, but it progressed more quickly once a sampling approach was taken. After a few samples matched the overall distribution well, the analysis proceeded with a 5% sample. (See the data munging document for details).

  • stats and ggplot - I struggled with the use of stats with ggplot, and some review of those is definitely needed. However, in the later stage I used the dplyr group_by, summarize “pipeline”. Creating a result set with the dplyr functions makes it much easier to verify the results in table form before plotting. Sums and totals can be checked interactively.

  • Raw dataset - As warned by the instructors, figuring out the differences in the raw data took time, but was still interesting.
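The group_by and summarise pattern mentioned above looks roughly like this. The rows and the summary column names are made up for illustration.

```r
library(dplyr)

contributions <- data.frame(   # made-up rows, not real data
  party = c("Democratic", "Democratic", "Democratic", "Republican", "Republican"),
  contb_receipt_amt = c(25, 50, 50, 100, 2500)
)

# Build the summary table first so it can be checked before plotting.
party_summary <- contributions %>%
  group_by(party) %>%
  summarise(n_contributions = n(),
            total_amt = sum(contb_receipt_amt),
            median_amt = median(contb_receipt_amt))

party_summary
```

Because the result is an ordinary table, its sums and counts can be compared against the raw data interactively before it is handed to ggplot.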

Lessons Learned

  • Perpetual tweaking - All results are interpretive, so I found a continuous need to explore further. The results seemed that they could be refined and code could be continuously improved. Similarly, continuously tweaking the plots to adjust for minor formatting is very time-consuming, but essential to achieving visual clarity and simplicity.

  • Dates in many formats - The usage of dates as months proved to be somewhat messy and I am not very comfortable with the end result. Most of the manipulation of dates is by month, which was represented as a yyyy-mm string, for example, ‘2012-08’. Although this worked reasonably well, it is treated as a factor and none of the specific continuous date scales can be used. In order to compare dates and use date scales, I assigned the first day of the month and used lubridate ymd function to create dates of type POSIXct. However, the first day of the month, such as “2012-08-01”, which is really “2012-08-01 00:00:00 UTC” results in “2012-07-31 19:00:00” when used by the date scale functions which operate in local timezone.

    This can be seen in

       library(lubridate)
       strftime(ymd(paste("2012-08","01", sep="-")), "%F") 
    ## [1] "2012-07-31"

    This was resolved by setting the date to the fifteenth of the month, which may be better because then the values align at about the middle of the month on the continuous scales.

    It seems that almost every language I know has multiple date manipulation libraries each with drawbacks and none having really solved all the problems with dates.
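The time zone shift described under "Dates in many formats" can be reproduced in base R without depending on a particular lubridate version. This sketch builds the POSIXct value explicitly in UTC; "America/New_York" is just an example local zone that I picked for the demonstration.

```r
# Midnight UTC on the first of the month.
first_of_month <- as.POSIXct("2012-08-01 00:00:00", tz = "UTC")

# Formatted in UTC the date is as expected ...
format(first_of_month, "%F", tz = "UTC")               # "2012-08-01"

# ... but in a zone west of UTC it falls on the previous day.
format(first_of_month, "%F", tz = "America/New_York")  # "2012-07-31"

# The fifteenth of the month stays in the right month in either zone.
mid_month <- as.POSIXct("2012-08-15 00:00:00", tz = "UTC")
format(mid_month, "%Y-%m", tz = "America/New_York")    # "2012-08"
```

An alternative that avoids the problem entirely is to keep monthly values as Date objects (for example with as.Date), since a Date carries no time of day to shift.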

Successes

  • Broad-based contributions - There is clear evidence in this data of the broader base of contributions to the Democratic Party. While I understood this broadly, I didn’t know that this holds across almost every state.

  • Primary candidates - The shapes of the contributions in the primary period, as shown in Plot 1, are particularly revealing. This certainly suggests that success in the primary could be predicted from the rate and size of contributions.

  • Occupation - The variation by occupation is particularly interesting because this invokes a whole set of different personal questions such as: Why do most writers contribute small amounts to the Democratic Party?

Data restrictions, limitations

  • Sampling - Using a sample has precluded analysis by employer or individual and has limited the extent of analysis by occupation. It’s possible that some of the state-specific counts by occupation may be very small in the sampled set and hence invalid.

  • Business contributions - These contributions, with higher limits, were excluded to simplify and focus the analysis.

  • Spousal contributions - The data set includes transactions that exceed $2,500 and corresponding “transfers from spouse”. This analysis does not handle these properly.

  • Returns - Campaign returns are excluded completely. We did estimate that returns only made up about 3% of the data set so the error of including a contribution but omitting the return in this 5% sample is ~ 3% * 5% = 0.15%

Further Analysis

Key areas that could be pursued are:

  • Swing States - Further analysis for swing states may be interesting, relating contributions to population and income.

  • Business contributions - Company contributions and contributions by employees may be very significant in total. It would be interesting to find employers with heavy pockets of contributions.

  • Contributions to multiple candidates - Do people contribute to the candidate they believe in or do they contribute to multiple candidates?

  • Swing states - Further analysis of swing states will require eliminating the effect of state population. I would be interested in finding occupational differences in swing states that may indicate sets of people that become more active the closer the race.

Broader Meaning

In the 2012 election year, the Citizens United decision cleared the way for much money to be contributed by corporations to PACs (political action committees) and Super PACs, so this analysis only covers a small portion of the money involved in presidential elections.

Overall this analysis has reinforced the idea that the Democratic Party is more broadly based relying on smaller contributions from a broader base than the Republican Party.